The impact of information fusion in steganalysis on the example of audio steganalysis
Abstract
Information fusion tries to determine the best set of experts in a given problem domain and to devise an appropriate function that can optimally combine the decisions of the individual experts. Only a few systematic approaches to information fusion exist so far in the signal processing field of steganalysis. Under the basic assumption that steganalysis can be seen as a statistical pattern recognition process like biometrics, a state-of-the-art five-level information fusion model known from biometrics is transferred to steganalysis, as well as to statistical detectability evaluations for watermarking algorithms, and its applicability is evaluated in practical testing. The primary test goal of these evaluations is to measure the impact of fusion on the classification accuracy. Therefore, match level and decision level fusion are performed here for three selected data hiding algorithms (one steganography and two watermarking), two feature extractors and five different classifiers. Heterogeneous audio test sets are used for content-independent training and testing. The secondary test goal of this work is to consider the impact of the key selection assumption on the accuracy of the classification in steganalysis. For the test cases, the results show an increase in classification accuracy for two of the three tested algorithms by match level fusion, no gain by decision level fusion, and a rather small impact of the key selection assumption on the statistical detectability.
1. MOTIVATION AND INTRODUCTION
Steganalysis based on statistical models is used to classify digital assets into unmodified objects and objects modified by a data hiding algorithm. Some quite mature approaches, especially in the image domain, not only show high classification accuracies (>99%) but also allow for message length estimation. Other domains, like the audio steganalysis considered here, have not yet reached the same degree of maturity as their image counterpart.
The approach presented in this document focuses on information fusion, a technique so far rather uncommon in steganalysis. The goal of using fusion is to improve the quality of steganalysis (measured here in classification accuracy) and thereby improve its value as a detection mechanism for the hidden embedding of information into digital objects, especially in a domain like audio where few reliable detection approaches exist so far. In contrast to previous work on fusion in steganalysis (Kharrazi et al. [10]), we focus on the question: how can the detection performance (measured in detection accuracy) on selected data hiding algorithms be improved by fusion in steganalysis? To address this question we transfer a five-level fusion model from the state of the art in biometrics to the domain of audio steganalysis, with the goal of increasing the detection performance (instead of aiming for a stronger universality of the steganalysis approach, as Kharrazi et al. do), and show how the overall steganalysis process would benefit from fusion operations, using match level and decision level fusion as examples. For the practical implementation of the fusion, a steganalysis approach that has been successfully employed in audio steganalysis [11] and audio forensics [13] in the past is combined with an approach adapted from image steganalysis. Based on this background, it can be assumed that the results derived in practical testing here can also be transferred back into the field of audio forensics and thereby help to establish trust (in terms of authenticity and integrity) in digital objects. The primary test goal defined for the evaluations performed here is to measure the impact of fusion on the classification accuracy.
To evaluate this goal, a match level and a decision level fusion of the two mentioned steganalysers, AAST (AMSL Audio Steganalysis Toolset) and AudioRS, and five different classifiers is performed for three selected data hiding algorithms, under the hypothesis that a complete file is either “marked” or “unmarked” by an information hiding algorithm (binary decision). The secondary test goal is to consider the impact of the key selection assumption on the accuracy of the classification in steganalysis. Both security mechanisms, steganalysis and media forensics, are of utmost importance for other IT disciplines such as secure data storage or long-term archiving, where establishing trust in the authenticity and integrity of communication or storage environments, as well as of the digital objects within these environments, is a necessity for any security concept or business model. Hidden channels within an archiving environment pose not only the imminent threat of misuse of such a system for hidden communication but also the potential threat of steganographically inserted malicious code [9], which might later violate, for example, the authenticity or integrity of stored objects. Towards the goal of improving the value of steganalysis as a secure and reliable detection mechanism, e.g. for secure storage applications, the tests performed in this work show, for example, that match level fusion increases the classification accuracy for two of the three tested algorithms. Comparing the performance of the five evaluated classifiers in the match level fusion performed here, the AdaBoost and linear logistic regression models seem to outperform the SVM, the Bayesian classifier and the decision tree used. The results for decision level fusion do not show any gain on this fusion level, indicating that a late fusion might not be the optimal choice for the steganalysis approaches used.
In addition, the key selection assumption is shown to have only a rather small impact on the statistical detectability of the tested algorithms. Here the tests show a significant deviation in the results between the two tested key scenarios in only 16.6% of the non-fusion test cases, all of them for either the decision tree or the logistic regression model. No such differences are seen in the fusion tests. The document is structured as follows: Section 2 describes the information fusion model used, which originates in biometrics and is here transferred to steganalysis for the exemplary domain of audio. All five fusion levels and the corresponding signal processing steps are introduced briefly. Section 3 describes the complete test scenario, including the test goals, the test setup and the procedure for the practical evaluations. The subsection on the test setup specifies the choices for the test sets used, the three data hiding algorithms, the feature computation steps and the five exemplarily chosen classifiers (with the corresponding output normalisation/weighting strategies). Section 4 contains the results of the practical tests, and section 5 concludes the document and shows perspectives for future work.
2. PATTERN RECOGNITION, STEGANALYSIS AND FUSION
Following the definition given by Bebis [2], pattern recognition is in general the study of how machines can observe their environment, learn to distinguish patterns of interest from their background signals and make sound and reasonable decisions about the categories of the patterns. The key objectives in pattern recognition are therefore to process the sensed data to eliminate noise, to hypothesise the models that describe each class population and, given a sensed pattern, to choose the best-fitting model in order to assign the pattern to the class associated with that model.
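The last two objectives named above (hypothesising a model per class and assigning a sensed pattern to the best-fitting model) can be illustrated with a minimal sketch. It is not the classification method used in this work: it assumes a single scalar feature and one Gaussian model per class, and the function names and feature values are hypothetical, chosen only for illustration.

```python
import math
from statistics import mean, pvariance


def fit_class_models(samples):
    """Estimate a 1-D Gaussian (mean, variance) for every class label."""
    models = {}
    for label, values in samples.items():
        # A small floor keeps the variance strictly positive.
        models[label] = (mean(values), max(pvariance(values), 1e-9))
    return models


def log_likelihood(x, model):
    """Log-density of x under a Gaussian given as (mean, variance)."""
    mu, var = model
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)


def classify(x, models):
    """Assign x to the class whose model fits it best."""
    return max(models, key=lambda label: log_likelihood(x, models[label]))


# Hypothetical, made-up feature values (not data from the paper).
training = {
    "unmarked": [0.10, 0.12, 0.09, 0.11],
    "marked": [0.30, 0.28, 0.33, 0.31],
}
models = fit_class_models(training)
print(classify(0.13, models))
print(classify(0.29, models))
```

A real steganalyser would use a multi-dimensional feature vector and a trained classifier in place of the two Gaussians, but the train-then-assign structure stays the same.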
Of the various main pattern recognition areas [2] (template matching, statistical pattern recognition, structural pattern recognition, syntactic pattern recognition, artificial neural networks, etc.), the approach of statistical pattern recognition is considered here for its application in steganalysis. This approach assumes that the patterns to be recognised (here the impact of the data embedding by data hiding algorithms/techniques) are represented in a feature space, and it tries to build a statistical model for pattern generation in this space. Figure 1 shows the general statistical pattern recognition scheme.
Figure 1: General statistical pattern recognition scheme (based on Bebis). Blocks shown: Patterns + Class labels, Pre-processing, Feature selection, Learning; Pattern, Pre-processing, Feature extraction, Classification.
Two research fields in which applied signal processing and statistical pattern recognition are extensively employed are biometrics [18] and HCI [21]. Having emerged in the 1960s and early 1970s (see e.g. Atal [1] for biometric speaker verification/identification), biometrics has by now reached a degree of maturity from which other, similar pattern recognition problems like steganalysis can benefit. The idea of a knowledge transfer from biometrics to steganalysis is not new. One previous attempt is presented by Kharrazi et al. [10]. In their paper the authors propose to transfer a concept called information fusion from biometrics to image domain steganalysis. Fusion, which is a fairly common technique in biometrics, has the goal of determining the best set of experts in a given problem domain and devising an appropriate function that can optimally combine the decisions rendered by the individual experts [18].
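The difference between combining expert outputs early and late can be sketched as follows. This is an illustrative sketch only, assuming that match level fusion combines normalised classifier scores before a single decision is taken and that decision level fusion combines the final decisions by majority vote; the min-max normalisation, the weights, the threshold and all score values are hypothetical and are not the strategies or data of this work.

```python
def min_max_normalise(score, lo, hi):
    """Map a raw classifier score onto [0, 1] given its observed range."""
    if hi == lo:
        return 0.5
    return min(1.0, max(0.0, (score - lo) / (hi - lo)))


def match_level_fusion(scores, ranges, weights, threshold=0.5):
    """Fuse normalised scores by a weighted mean, then decide once."""
    total_weight = sum(weights)
    fused = sum(
        w * min_max_normalise(s, lo, hi)
        for s, (lo, hi), w in zip(scores, ranges, weights)
    ) / total_weight
    return ("marked" if fused >= threshold else "unmarked"), fused


def decision_level_fusion(decisions):
    """Fuse final decisions by majority vote (ties fall to 'unmarked')."""
    marked_votes = sum(1 for d in decisions if d == "marked")
    return "marked" if marked_votes > len(decisions) / 2 else "unmarked"


# Hypothetical scores of three classifiers for one audio file.
scores = [0.9, 2.0, 0.40]
ranges = [(0.0, 1.0), (-5.0, 5.0), (0.0, 1.0)]
weights = [2.0, 1.0, 1.0]
label, fused = match_level_fusion(scores, ranges, weights)
print(label, round(fused, 3))
print(decision_level_fusion(["marked", "unmarked", "unmarked"]))
```

The sketch makes one design difference visible: the score-based variant can let a confident expert outweigh two hesitant ones, whereas the vote discards that confidence information before fusing.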